{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# Pandas code cards\n", "## Try me\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ffraile/computer_science_tutorials/blob/main/source/Data%20Manipulation/exercises/Pandas%20code%20cards.ipynb)[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ffraile/computer_science_tutorials/main?labpath=source%2FData%20Manipulation%2Fexercises%2FPandas%20code%20cards.ipynb)\n", "\n", "## How to use\n", "- Each card mirrors an A4 classroom prompt. **Predict first** (or discuss), then run the cell to check.\n", "- Detective cards show a buggy idea in Markdown; the code cell shows a **fixed** version.\n", "- Keep explanations short and schematic (*what* → *why*).\n", "\n", "### Turn Gemini into a coding tutor (no direct answers)\n", "Paste this in your first chat with Gemini to keep it in “tutor mode”:\n", "\n", "```markdown\n", "You are a **coding tutor** for Python in Jupyter/Colab. Follow the **course motto** “do not give up learning.”\n", "\n", "### Role & Goals\n", "- Use **Socratic guidance** and **test-first thinking** to help me solve problems myself.\n", "- Help me read errors, reason about state, and make small, safe iterations.\n", "\n", "### Strict Rules\n", "1) **Do not** provide full working solutions or paste complete functions/programs.\n", " - You may show **tiny illustrative fragments (≤3 lines)** or **pseudo-code with TODOs**, but not a drop-in answer.\n", "2) Prefer **questions over answers**; offer **one small next step** at a time.\n", "3) When debugging, explain **what the traceback says**, give **2–3 hypotheses**, and propose the **smallest diff** in *plain English* first.\n", "4) Encourage **TDD**: ask me to write/assert a test, predict, run, and report outputs.\n", "5) Keep responses concise (≈120–150 words) unless I ask for a deeper explanation or code review.\n", "6) Ask me to **run code and share results**; adapt based on the output.\n", "7) If I request the full solution, remind me of the rules and offer a **higher-tier hint** instead.\n", "8) When I finalize an exercise, reinforce learning lessons and suggest additional exercises\n", "\n", "### Interaction Loop (use this structure)\n", "- **Restate goal:** what I’m trying to accomplish in one line.\n", "- **Diagnose:** key assumption to check or error to interpret.\n", "- **Hint (tiered):**\n", " - Tier 1: Conceptual nudge (no code).\n", " - Tier 2: Directed hint (identify line/construct to change).\n", " - Tier 3: Pseudo-code with TODOs or a **1–3 line** pattern (still not a full solution).\n", "- **Next action:** one concrete step for me to try now.\n", "- **Ask back:** what to run/paste (output, test result, or traceback).\n", "\n", "### When reviewing my code\n", "- Comment on **correctness, clarity, naming, and complexity (big-O)**.\n", "- Suggest **tests** I’m missing (boundaries, empty cases, error paths).\n", "\n", "### Safety & Ethics\n", "- No secrets or private data in prompts.\n", "- avoid library functions/APIs unless I ask.\n", "\n", "Stay in tutor mode for the whole session.\n", "```\n", "\n", "## Code Cards\n", "1. Predict the output of this code:\n", "```python\n", "import pandas as pd\n", "data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}\n", "df = pd.DataFrame(data)\n", "print(df.iloc[0, 1])\n", "```\n" ], "id": "a669349e86dafe05" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "2034110f982772d8" }, { "metadata": {}, "cell_type": "markdown", "source": [ "2. Predict the output of this code:\n", "```python\n", "import pandas as pd\n", "data = {'X': [10, 20, 30], 'Y': [40, 50, 60], 'Z': [70, 80, 90]}\n", "df = pd.DataFrame(data)\n", "print(df[['X', 'Z']])\n", "```\n" ], "id": "7656bd3be66abbb5" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "96d12c633a013576" }, { "metadata": {}, "cell_type": "markdown", "source": [ "The next exercises use the dataframe ```df``` below, loading Diabetes data from a CSV file, and has the following columns:\n", "\n", "* Pregnancies: Number of times pregnant\n", "* Glucose: Plasma glucose concentration a 2 hours in an oral glucose tolerance test\n", "* BloodPressure: Diastolic blood pressure (mm Hg)\n", "* SkinThickness: Triceps skin fold thickness (mm)\n", "* Insulin: 2-Hour serum insulin (mu U/ml)\n", "* BMI: Body mass index (weight in kg/(height in m)^2)\n", "* DiabetesPedigreeFunction: Diabetes pedigree function\n", "* Age: Age (years)\n", "* Outcome: Class variable (0 or 1)\n", "* 268 of 768 are 1, the others are 0\n", "* Class Distribution: (class value 1 is interpreted as \"tested positive for diabetes\")\n", "\n", "The following code loads the dataset into a Pandas dataframe and shows the first 5 rows:\n", "\n" ], "id": "e2f1fc736b948087" }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-25T18:44:56.558283Z", "start_time": "2025-12-25T18:44:55.813155Z" } }, "cell_type": "code", "source": [ "import pandas as pd\n", "df = pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/diabetes.csv')\n", "df.head()" ], "id": "7e43825c9dabc0c1", "outputs": [ { "data": { "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50 1 \n", "1 0.351 31 0 \n", "2 0.672 32 1 \n", "3 0.167 21 0 \n", "4 2.288 33 1 " ], "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PregnanciesGlucoseBloodPressureSkinThicknessInsulinBMIDiabetesPedigreeFunctionAgeOutcome
061487235033.60.627501
11856629026.60.351310
28183640023.30.672321
318966239428.10.167210
40137403516843.12.288331
\n", "
" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 1 }, { "metadata": {}, "cell_type": "markdown", "source": [ "3. Complete the code below to calculate the average BMI of patients with diabetes (Outcome == 1) and without diabetes (Outcome == 0).\n", "```python\n", "df_diabetes = df[?] # Filter rows where Outcome == 1\n", "avg_bmi_diabetes = df_diabetes['BMI'].mean()\n", "df_no_diabetes = df[?] # Filter rows where Outcome == 0\n", "avg_bmi_no_diabetes = df_no_diabetes['BMI'].mean()\n", "print(\"Average BMI of patients with diabetes:\", avg_bmi_diabetes)\n", "print(\"Average BMI of patients without diabetes:\", avg_bmi_no_diabetes)\n", "```\n", "\n" ], "id": "2d0b736b2279c84" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "edcf967431752fb" }, { "metadata": {}, "cell_type": "markdown", "source": [ "4. Complete the code below to find cases where the number of pregnancies is greater than 2 and the glucose level is above 150.\n", "```python\n", "filtered_df = df.query(?) # Filter rows where Pregnancies > 2 and Glucose > 150\n", "print(filtered_df.describe())\n", "```" ], "id": "5a784bb190830671" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "bafc8fec86cafdd8" }, { "metadata": {}, "cell_type": "markdown", "source": [ "5. Complete the code below to find cases where patients are older than 50 or have a BMI greater than 30.\n", "\n", "```python\n", "filtered_df = df.query(?) # Filter rows where Age > 50 or BMI > 30\n", "print(filtered_df.describe())\n", "```" ], "id": "b98b80bac80ea541" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "44478569d37423a" }, { "metadata": {}, "cell_type": "markdown", "source": [ "The next exercises use the COVID dataset, which contains information about the COVID-19 pandemic. The dataset contains the following columns:\n", "- Date: Date of the record\n", "- Country: Country name\n", "- Confirmed: Cumulative number of confirmed cases\n", "- Recovered: Cumulative number of recovered cases\n", "- Deaths: Cumulative number of deaths\n", "\n", "\n", "The following code loads the dataset into a Pandas dataframe:" ], "id": "eea68429a4045957" }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-25T18:58:59.640588Z", "start_time": "2025-12-25T18:58:41.261693Z" } }, "cell_type": "code", "source": [ "covid_pd = pd.read_csv('https://raw.githubusercontent.com/ffraile/computer_science_tutorials/main/source/Data%20Manipulation/exercises/datasets/covid.csv')\n", "\n", "covid_pd" ], "id": "f59dadff8bc906b3", "outputs": [ { "data": { "text/plain": [ " Date Country Confirmed Recovered Deaths\n", "0 2020-01-22 Afghanistan 0 0 0\n", "1 2020-01-23 Afghanistan 0 0 0\n", "2 2020-01-24 Afghanistan 0 0 0\n", "3 2020-01-25 Afghanistan 0 0 0\n", "4 2020-01-26 Afghanistan 0 0 0\n", "... ... ... ... ... ...\n", "143663 2022-01-19 Zimbabwe 226887 0 5266\n", "143664 2022-01-20 Zimbabwe 227552 0 5276\n", "143665 2022-01-21 Zimbabwe 227961 0 5288\n", "143666 2022-01-22 Zimbabwe 228179 0 5292\n", "143667 2022-01-23 Zimbabwe 228254 0 5294\n", "\n", "[143668 rows x 5 columns]" ], "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCountryConfirmedRecoveredDeaths
02020-01-22Afghanistan000
12020-01-23Afghanistan000
22020-01-24Afghanistan000
32020-01-25Afghanistan000
42020-01-26Afghanistan000
..................
1436632022-01-19Zimbabwe22688705266
1436642022-01-20Zimbabwe22755205276
1436652022-01-21Zimbabwe22796105288
1436662022-01-22Zimbabwe22817905292
1436672022-01-23Zimbabwe22825405294
\n", "

143668 rows × 5 columns

\n", "
" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 2 }, { "metadata": {}, "cell_type": "markdown", "source": [ "6. Complete the code below to obtain the time series of Deaths in Spain.\n", "```python\n", "spain_deaths = covid_pd[?] # Filter rows where Country == 'Spain'\n", "print(spain_deaths[['Date', 'Deaths']])\n", "```" ], "id": "26f07e060270c7a8" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "ec866da50b1d5c1a" }, { "metadata": {}, "cell_type": "markdown", "source": [ "7. Complete the code below to extract the Confirmed cases in China in 2020.\n", "```python\n", "china_2020 = covid_pd.query() # Filter rows where Country == 'China' and Date is in 2020\n", "print(china_2020[['Date', 'Confirmed']])\n", "```" ], "id": "e533677ea0d44837" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "916e04582aa38e4" }, { "metadata": {}, "cell_type": "markdown", "source": [ "8. Knowing that the function ```diff()``` computes the difference between consecutive rows, and that ```idmax()``` returns the index of the row with max value, complete the code below to find the date with the highest dead toll in the US.\n", "```python\n", "us_deaths = covid_pd[?].copy() # Filter rows where Country == 'US'\n", "us_deaths['Daily_Deaths'] = us_deaths['Deaths'].diff()\n", "max_death_date = us_deaths.loc[us_deaths['Daily_Deaths'].idxmax()]\n", "print(\"Date with highest daily deaths in US:\", max_death_date['Date'], \"with\", max_death_date['Daily_Deaths'], \"deaths\")\n", "```" ], "id": "677baaba26196a89" }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-25T19:08:18.118049Z", "start_time": "2025-12-25T19:08:18.109202Z" } }, "cell_type": "code", "source": "", "id": "eeef391fa6f4189a", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Date with highest daily deaths in US: 2021-01-20 with 4442.0 deaths\n" ] } ], "execution_count": 4 }, { "metadata": {}, "cell_type": "markdown", "source": [ "The next exercises use a dataframe that loads data from the following dictionary:\n", "```python\n", "data = [\n", " {\"id\": 101, \"name\": \"Scott Summers\", \"alias\": \"Cyclops\", \"program\":\"CS\", \"score\": 9.5},\n", " {\"id\": 102, \"name\": \"Jean Grey\", \"alias\": \"Phoenix\", \"program\":\"CS\", \"score\": 8.7},\n", " {\"id\": 103, \"name\": \"Logan Howlett\", \"alias\": \"Wolverine\", \"program\":\"CS\", \"score\": 7.8},\n", " {\"id\": 104, \"name\": \"Ororo Munroe\", \"alias\": \"Storm\", \"program\":\"CS\", \"score\": 8.9},\n", " {\"id\": 105, \"name\": \"Charles Xavier\", \"alias\": \"Professor X\", \"program\":\"CS\", \"score\": 10},\n", " {\"id\": 101, \"name\": \"Scott Summers\", \"alias\": \"Cyclops\", \"program\":\"Physics\", \"score\": 10.0},\n", " {\"id\": 102, \"name\": \"Jean Grey\", \"alias\": \"Phoenix\", \"program\":\"Physics\", \"score\": 9.5},\n", " {\"id\": 103, \"name\": \"Logan Howlett\", \"alias\": \"Wolverine\", \"program\":\"Physics\", \"score\": 8.0},\n", " {\"id\": 104, \"name\": \"Ororo Munroe\", \"alias\": \"Storm\", \"program\":\"Physics\", \"score\": 9.1},\n", " {\"id\": 105, \"name\": \"Charles Xavier\", \"alias\": \"Professor X\", \"program\":\"Physics\", \"score\": 10.0}\n", "]\n", "```\n", "The following code loads the dataset into a Pandas dataframe:" ], "id": "2888da3b8063ad6a" }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-25T19:15:44.511453Z", "start_time": "2025-12-25T19:15:44.506267Z" } }, "cell_type": "code", "source": [ "data = [\n", " {\"id\": 101, \"name\": \"Scott Summers\", \"alias\": \"Cyclops\", \"program\":\"CS\", \"score\": 9.5},\n", " {\"id\": 102, \"name\": \"Jean Grey\", \"alias\": \"Phoenix\", \"program\":\"CS\", \"score\": 8.7},\n", " {\"id\": 103, \"name\": \"Logan Howlett\", \"alias\": \"Wolverine\", \"program\":\"CS\", \"score\": 7.8},\n", " {\"id\": 104, \"name\": \"Ororo Munroe\", \"alias\": \"Storm\", \"program\":\"CS\", \"score\": 8.9},\n", " {\"id\": 105, \"name\": \"Charles Xavier\", \"alias\": \"Professor X\", \"program\":\"CS\", \"score\": 10},\n", " {\"id\": 101, \"name\": \"Scott Summers\", \"alias\": \"Cyclops\", \"program\":\"Physics\", \"score\": 10.0},\n", " {\"id\": 102, \"name\": \"Jean Grey\", \"alias\": \"Phoenix\", \"program\":\"Physics\", \"score\": 9.5},\n", " {\"id\": 103, \"name\": \"Logan Howlett\", \"alias\": \"Wolverine\", \"program\":\"Physics\", \"score\": 8.0},\n", " {\"id\": 104, \"name\": \"Ororo Munroe\", \"alias\": \"Storm\", \"program\":\"Physics\", \"score\": 9.1},\n", " {\"id\": 105, \"name\": \"Charles Xavier\", \"alias\": \"Professor X\", \"program\":\"Physics\", \"score\": 10.0}\n", "]\n", "df1 = pd.DataFrame(data)\n", "df1" ], "id": "421be74b6106e117", "outputs": [ { "data": { "text/plain": [ " id name alias program score\n", "0 101 Scott Summers Cyclops CS 9.5\n", "1 102 Jean Grey Phoenix CS 8.7\n", "2 103 Logan Howlett Wolverine CS 7.8\n", "3 104 Ororo Munroe Storm CS 8.9\n", "4 105 Charles Xavier Professor X CS 10.0\n", "5 101 Scott Summers Cyclops Physics 10.0\n", "6 102 Jean Grey Phoenix Physics 9.5\n", "7 103 Logan Howlett Wolverine Physics 8.0\n", "8 104 Ororo Munroe Storm Physics 9.1\n", "9 105 Charles Xavier Professor X Physics 10.0" ], "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnamealiasprogramscore
0101Scott SummersCyclopsCS9.5
1102Jean GreyPhoenixCS8.7
2103Logan HowlettWolverineCS7.8
3104Ororo MunroeStormCS8.9
4105Charles XavierProfessor XCS10.0
5101Scott SummersCyclopsPhysics10.0
6102Jean GreyPhoenixPhysics9.5
7103Logan HowlettWolverinePhysics8.0
8104Ororo MunroeStormPhysics9.1
9105Charles XavierProfessor XPhysics10.0
\n", "
" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 5 }, { "metadata": {}, "cell_type": "markdown", "source": [ "9. Predict the output of this code:\n", "\n", "```python\n", "df1_cs = df1[alias == 'Cyclops']\n", "print(df1_cs)\n", "```\n", "\n" ], "id": "b5b26907d045642a" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "d64de17f97127143" }, { "metadata": {}, "cell_type": "markdown", "source": [ "10. Predict the output of this code:\n", "```python\n", "df1_high = df1[df1['score'] > 9.0]\n", "print(df1_high)\n", "```" ], "id": "638e51fadd882df8" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "ddf3666dc6524f04" }, { "metadata": {}, "cell_type": "markdown", "source": [ "11. Complete the code below to find the average score of students in the CS program.\n", "```python\n", "df1_cs = df1[?] # Filter rows where program == 'CS'\n", "avg_score_cs = df1_cs['score'].mean()\n", "print(\"Average score in CS program:\", avg_score_cs)\n", "```" ], "id": "f2d9a3237dc295c" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "e9acc2cf6d594c45" }, { "metadata": {}, "cell_type": "markdown", "source": [ "12. Complete the code below to find students with a score greater than 9.0 in the Physics program.\n", "```python\n", "df1_physics_high = df1.query(?)# Filter rows where program is Physics and score is greater than 9.0\n", "print(df1_physics_high)\n", "```" ], "id": "89d9920bf365cb52" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "584a0996e7be3c7e" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }